Request


This report includes a description and the results of the analyses requested by Elisa Jentho at 08/06/2021:

1 - Do functional enrichment analysis using only the functional database GO:BP.

2 - Re-do heatmaps of differential expression genes using log2 fold change values.

3 - Plot additional markers in the integrated overlay UMAPs.






About this report


The R version used was 3.6.3 (R Core Team 2020). This report was built with the rmarkdown R package (v.2.5) (Allaire et al. 2020; Xie, Allaire, and Grolemund 2018; Xie, Dervieux, and Riederer 2020). See the full list of packages and versions used at the end of this report - R packages used and respective versions.

The integrated 10x Lox2 and Cre3 samples processed previously were imported to plot the markers upon the UMAPs with the Seurat R package (v.4.0.0) (Satija et al. 2015; Butler et al. 2018; Stuart et al. 2019; Hao et al. 2020).

The figures and tables displayed through the report can be download in pdf and tab-separated format by clicking on the bottom left Download plot and Download table button that appears after each plot and table, respectively.

## Set seed 
set.seed(seed = 1024) # to keep reproducibility
## Import packages 

library("dplyr", quietly = TRUE)
library("DT", quietly = TRUE) # package to print nice in the html report 
library("Seurat", quietly = TRUE)
library("gplots", quietly = TRUE)
library("Matrix", quietly = TRUE)
library("ggplot2", quietly = TRUE)
library("tidyr", quietly = TRUE)
library("readxl", quietly = TRUE)
library("gprofiler2", quietly = TRUE)
library("ComplexHeatmap", quietly = TRUE)






Data analysis


Import data


The Cre3 and Lox2 integrated samples were imported.

## Import Seurat object with the re3 & Lox2 integrated samples

# import obj integrated
seu <- readRDS(file = "../results/int_28_05_21/R_objects/seu.rds")

It was also imported the differential gene expression tables obtained before.

## Import DGE tables

dge_tables <- "../results/int_28_05_21/tables/dge_tables"
dge2import <- list.files(path = dge_tables, pattern = "clt_", full.names = TRUE)
names(dge2import) <- lapply(basename(dge2import), function(x) strsplit(x = x, split = "_")[[1]][1:2] %>% 
                              unlist(.) %>% paste(., collapse = "_")) %>% unlist(.)
dge <- lapply(dge2import, function(x) read.table(file = x, header = TRUE, sep = "\t", 
                                                 stringsAsFactors = FALSE))

It was also imported the markers to plot in the overlay integrated UMAPs as well as the markers for cluster 8 from the integrated Cre3 sample alone ("PCHAS-130590", "PCHAS-130630", "PCHAS-130770", "PCHAS-131600", "PCHAS-134500").

## Plot overlay markers for cluster 8 
markers_clt8 <- c("PCHAS-130590", "PCHAS-130630", "PCHAS-130770", "PCHAS-131600", "PCHAS-134500")

# import new markers
markers2import <- "../data/markers_to_plot_08_06_21/MArkers.xlsx"
markers_all <- read_excel(path = markers2import, col_names = FALSE)
colnames(markers_all) <- "Genes"

The new markers to plot are highlighted below.







Functional enrichment: GO:BP


In order to perform a functional enrichment analysis between the differentially expressed up- and down-regulated genes obtained from each pairwise comparison between each cluster across Cre3 vs Lox2, it was used the gprofiler2 R package (v.0.2.0) (Kolberg and Raudvere 2020), an interface to the g:Profiler web browser tool. The function gost() was applied in order to perform enrichment based on the list of up- or down-regulated genes, between each pairwise comparison (independently), against the annotated genes (domain_scope = “annotated”) of the organism Plasmodium chabaudi (organism = "pchabaudi"). The gene lists were ordered by increasing adjusted p-value (ordered_query = TRUE) in order to generate a GSEA (Gene Set Enrichment Analysis) style p-values. This allows to start the enrichment testing from the top most biological relevant genes with subsequent tests involving larger sets of genes. In addition, only statistically significant (user_threshold = 0.05) enriched functions are returned (significant = TRUE) after multiple testing correction with the default method g:SCS (correction_method = "g_SCS"). Finally evidence codes are added to the final result (evcodes = TRUE). The functional enrichment analysis was performed only against the GO:BP functional database by providing the option sources = "GO:BP".


It was tested functional enrichment for all clusters, with the exception of cluster 8, because it lacks differential gene expression data. It was given individually a list of up- (avg_log2FC > 0) or downregulated (avg_log2FC < 0) genes ranked (from the most significant to the lowest) by adjusted p-values (only genes with an adjusted p-value < 0.05).

The result of this analysis is a plot and a table for each up- and down-regulated gene list from each cluster pairwise comparison across Cre3 vs Lox2, that shows the functional enrichment based on the -log10 of the adjusted p-value across different functional databases (see above). Only significant functions are reported. The table contains the following columns/information (the information below was copied entirely from the original vignette here):


  • query: the name of the input query which by default is the order of query with the prefix “query_”. This can be changed by using a named list input.

  • significant: indicator for statistically significant results

  • p_value: hypergeometric p-value after correction for multiple testing

  • term_size: number of genes that are annotated to the term

  • query_size: number of genes that were included in the query. This might be different from the size of the original list if:

    • any genes were mapped to multiple Ensembl gene IDs

    • any genes failed to be mapped to Ensembl gene IDs

    • the parameter ordered_query = TRUE and the optimal cutoff for the term was found before the end of the query

    • the domain_scope was set to “annotated” or “custom”

  • intersection_size: the number of genes in the input query that are annotated to the corresponding term

  • precision: the proportion of genes in the input list that are annotated to the function (defined as intersection_size/query_size)

  • recall: the proportion of functionally annotated genes that the query recovers (defined as intersection_size/term_size)

  • term_id: unique term identifier (e.g GO:0005005)

  • source: the abbreviation of the data source for the term (e.g. GO:BP)

  • term_name: the short name of the function

  • effective_domain_size: the total number of genes “in the universe” used for the hypergeometric test

  • source_order: numeric order for the term within its data source (this is important for drawing the results)

  • parents: list of term IDs that are hierarchically directly above the term. For non-hierarchical data sources this points to an artificial root node.


In the bubble plots represented below, functions that have a -log10(adjusted p-value)>16 will be capped; i.e., they will appear close to the vertical dashed black line. All these results reported below were run at 09/06/2021 using the archived version of the gprofiler2 server - Ensembl 102, Ensembl Genomes 49 (database built on 2020-12-15): https://biit.cs.ut.ee/gprofiler_archive3/e102_eg49_p15/gost.

## Get up and down gene lists and parse them

reg_gene_list <- list()
for ( clt in names(dge) ) { # select dge by cluster and retrieve name
  sub_df <- dge[[clt]]
  #sub_df[,"Geneid"] <- rownames(sub_df)
  gene_ids_up <- sub_df %>% 
    filter(p_val_adj < 0.05 & avg_log2FC > 0) %>% 
    arrange(p_val_adj) %>% 
    pull(Geneid)
  gene_ids_down <- sub_df %>% 
    filter(p_val_adj < 0.05 & avg_log2FC < 0) %>% 
    arrange(p_val_adj) %>% 
    pull(Geneid)  
  reg_gene_list[[clt]] <- list()
  reg_gene_list[[clt]][["up"]] <- gsub(pattern = "PCHAS-", replacement = "PCHAS_", x = gene_ids_up)
  reg_gene_list[[clt]][["down"]] <- gsub(pattern = "PCHAS-", replacement = "PCHAS_", x = gene_ids_down)
}

# create folders.
r_object_folder <- "../results/func_enrich_plots_09_06_21/R_objects"
if ( ! dir.exists(r_object_folder) ) dir.create(r_object_folder, recursive = TRUE)
func_enrich_folder <- "../results/func_enrich_plots_09_06_21/tables/functional_enrichment"
if ( ! dir.exists(func_enrich_folder) ) dir.create(func_enrich_folder, recursive = TRUE)
### Functional enrichment of DEG 

## Run gprofiler2
func_enrich <- list()
set_base_url("https://biit.cs.ut.ee/gprofiler_archive3/e102_eg49_p15")
for ( clt in names(reg_gene_list) ){ # loop over list and do functional enrichment
  #print(get_base_url())
  func_enrich[[clt]] <- list()
  set.seed(1024)
  func_enrich[[clt]][["up"]] <- gost(query = reg_gene_list[[clt]][["up"]], 
                                     organism = "pchabaudi", ordered_query = TRUE, 
                                     multi_query = FALSE, significant = TRUE, exclude_iea = FALSE, 
                                     measure_underrepresentation = FALSE, evcodes = TRUE, 
                                     user_threshold = 0.05, correction_method = "g_SCS", 
                                     domain_scope = "annotated", custom_bg = NULL, 
                                     numeric_ns = "", sources = "GO:BP")
  if ( ! is.null(func_enrich[[clt]][["up"]]$result) ) {
    sub_df_up <- func_enrich[[clt]][["up"]]$result %>% 
    apply(X = ., MARGIN = 2, FUN =  function(x) as.character(x)) 
    write.table(x = sub_df_up, 
                file = paste(func_enrich_folder, paste0(clt, "_up_functional_enrichment_table.tsv"), 
                             sep = "/"), 
                quote = FALSE, sep = "\t", row.names = FALSE, col.names = TRUE)
  }
  set.seed(1024)
  func_enrich[[clt]][["down"]] <- gost(query = reg_gene_list[[clt]][["down"]], 
                                       organism = "pchabaudi", ordered_query = TRUE, 
                                       multi_query = FALSE, significant = TRUE, exclude_iea = FALSE, 
                                       measure_underrepresentation = FALSE, evcodes = TRUE, 
                                       user_threshold = 0.05, correction_method = "g_SCS", 
                                       domain_scope = "annotated", custom_bg = NULL, 
                                       numeric_ns = "", sources = "GO:BP")
  if ( ! is.null(func_enrich[[clt]][["down"]]$result) ) {
    sub_df_down <- func_enrich[[clt]][["down"]]$result %>% 
    apply(X = ., MARGIN = 2, FUN =  function(x) as.character(x)) 
    write.table(x = sub_df_down, 
                file = paste(func_enrich_folder, paste0(clt, "_down_functional_enrichment_table.tsv"), 
                             sep = "/"), 
                quote = FALSE, sep = "\t", row.names = FALSE, col.names = TRUE)
  }
}

# Create R object
saveRDS(object = func_enrich, file = paste(r_object_folder, "func_enrich.rds", sep = "/"))
# Import functional enrichment object to avoid to have to run the API app
func_enrich <- readRDS(file = paste(r_object_folder, "func_enrich.rds", sep = "/"))


Among the clusters tested, it was only found significant functional enrichment pathways for cluster 3 (down), 4 (up) and 7 (up).

## Functional enrichments - barplots

# Create folder to save results
func_enrich_barplots_folder <- "../results/func_enrich_plots_09_06_21/plots/func_enrich_barplots"
if ( ! dir.exists(func_enrich_barplots_folder) ) dir.create(func_enrich_barplots_folder, recursive = TRUE)

# Plot and save data
func_enrich_list <- list()
for ( clt in names(func_enrich) ) { # loop over cluster
  if ( ! is.null(names(func_enrich[[clt]])) ) { # if cluster is not NULL
    func_enrich_list[[clt]] <- list()
    for ( reg in names(func_enrich[[clt]]) ) { # loop over regulation: 'up' and 'down'
      if ( reg == "up" ) {
        color <- "#E64B35FF"
      } else {
        color <- "#4DBBD5FF"
      }
      # Plot barplot of functional enrichment 
      func_enrich_list[[clt]][[reg]] <- func_enrich[[clt]][[reg]]$result %>% 
        arrange(source, p_value) %>%
        mutate(term_name = factor(term_name, levels = rev(unique(term_name)))) %>%
        ggplot(data = ., mapping = aes(x = term_name, y = -log10(p_value))) + #, fill = source)) + 
        geom_bar(stat = "identity", fill = color) + 
        #ggsci::scale_fill_npg(name = "Source") +
        #facet_grid( source ~ . , scales = "free_y", space = "free") +
        coord_flip() + 
        theme_bw() + 
        ylab("-log10 (adjusted p-value)") + 
        xlab("Term name") + 
        theme(axis.text = element_text(size = 10, color = "black"))
      # save
      ggsave(filename = paste(func_enrich_barplots_folder, paste0(clt, "_", reg, "_func_enrich_barplot.pdf"), sep = "/"), 
             plot = func_enrich_list[[clt]][[reg]], width = 6, height = 4)
    }
  }
}



  • Cluster 3 (down)


print(func_enrich_list$clt_3$down)




  • Cluster 4 (up)


print(func_enrich_list$clt_4$up)




  • Cluster 7 (up)


print(func_enrich_list$clt_7$up)







DGE: Cre3 vs Lox2 (heatmaps)


A gene was considered differentially expressed (and plotted) if it has an absolute log2FC>0 and an adjusted p-value<0.05. It was used the R package ComplexHeatmap (v.2.2.0) (Gu, Eils, and Schlesner 2016). The same color gradient scale ranging from -4 to 4 was giving to all the heatmaps in order to make them comparable. For the heatmaps with more genes differentially expressed, only the top 10 up- and downregulated genes were labeled.


  • DGE heatmap: Cre3 vs Lox2 - cluster 0
# path to save plots
dge_folder <- "../results/func_enrich_plots_09_06_21/plots/dge"
if ( ! dir.exists(dge_folder) ) dir.create(dge_folder)

## DGE - heatmaps - clt 0

# data
cluster <- "clt_0"
sub_df <- dge[[cluster]] %>% 
  filter(p_val_adj<0.05 & abs(avg_log2FC)>0) %>% 
  arrange(desc(avg_log2FC))
row.names(sub_df) <- sub_df$Geneid
sub_df <- sub_df[,"avg_log2FC", drop = FALSE]
row_annot <- rowAnnotation(foo = anno_mark(at = c(1:2,45:54), 
                                           labels = row.names(sub_df)[c(1:2,45:54)], 
                                           labels_gp = gpar(fontsize = 8.5)))
col_fun <- circlize::colorRamp2(c(-4, 0, 4), c("blue", "white", "red"))

# save
set.seed(1024)
heat_clt0 <- Heatmap(matrix = as.matrix(sub_df), name = "Average\nlog2FC", 
                     cluster_columns = FALSE, cluster_rows = FALSE, 
                     show_column_names = FALSE, col = col_fun, 
                     show_row_names = FALSE,
                     right_annotation = row_annot)
set.seed(1024)
pdf(file = paste(dge_folder, paste0(cluster, "_dge_heatmap.pdf"), sep = "/"), 
    width = 3, height = 0.1 * nrow(sub_df))
print(heat_clt0)
dev.off()
## png 
##   2
# print
set.seed(1024)
print(heat_clt0)




  • DGE heatmap: Cre3 vs Lox2 - cluster 1
## DGE - heatmaps - clt 1

# data
cluster <- "clt_1"
sub_df <- dge[[cluster]] %>% 
  filter(p_val_adj<0.05 & abs(avg_log2FC)>0) %>% 
  arrange(desc(avg_log2FC))
row.names(sub_df) <- sub_df$Geneid
sub_df <- sub_df[,"avg_log2FC", drop = FALSE]
row_annot <- rowAnnotation(foo = anno_mark(at = c(1:10,44:53), 
                                           labels = row.names(sub_df)[c(1:10,44:53)], 
                                           labels_gp = gpar(fontsize = 8.5)))
col_fun <- circlize::colorRamp2(c(-4, 0, 4), c("blue", "white", "red"))

# save
set.seed(1024)
heat_clt1 <- Heatmap(matrix = as.matrix(sub_df), name = "Average\nlog2FC", 
                     cluster_columns = FALSE, cluster_rows = FALSE, 
                     show_column_names = FALSE, col = col_fun, 
                     show_row_names = FALSE,
                     right_annotation = row_annot)
set.seed(1024)
pdf(file = paste(dge_folder, paste0(cluster, "_dge_heatmap.pdf"), sep = "/"), 
    width = 3, height = 0.1 * nrow(sub_df))
print(heat_clt1)
dev.off()
## png 
##   2
# print
set.seed(1024)
print(heat_clt1)




  • DGE heatmap: Cre3 vs Lox2 - cluster 2
## DGE - heatmaps - clt 2

# data
cluster <- "clt_2"
sub_df <- dge[[cluster]] %>% 
  filter(p_val_adj<0.05 & abs(avg_log2FC)>0) %>% 
  arrange(desc(avg_log2FC))
row.names(sub_df) <- sub_df$Geneid
sub_df <- sub_df[,"avg_log2FC", drop = FALSE]
row_annot <- rowAnnotation(foo = anno_mark(at = c(1:9,55:64), 
                                           labels = row.names(sub_df)[c(1:9,55:64)], 
                                           labels_gp = gpar(fontsize = 8.5)))
col_fun <- circlize::colorRamp2(c(-4, 0, 4), c("blue", "white", "red"))

# save
set.seed(1024)
heat_clt2 <- Heatmap(matrix = as.matrix(sub_df), name = "Average\nlog2FC", 
                     cluster_columns = FALSE, cluster_rows = FALSE, 
                     show_column_names = FALSE, col = col_fun, 
                     show_row_names = FALSE,
                     right_annotation = row_annot)
set.seed(1024)
pdf(file = paste(dge_folder, paste0(cluster, "_dge_heatmap.pdf"), sep = "/"), 
    width = 3, height = 0.1 * nrow(sub_df))
print(heat_clt2)
dev.off()
## png 
##   2
# print
set.seed(1024)
print(heat_clt2)




  • DGE heatmap: Cre3 vs Lox2 - cluster 3
## DGE - heatmaps - clt 3

# data
cluster <- "clt_3"
sub_df <- dge[[cluster]] %>% 
  filter(p_val_adj<0.05 & abs(avg_log2FC)>0) %>% 
  arrange(desc(avg_log2FC))
row.names(sub_df) <- sub_df$Geneid
sub_df <- sub_df[,"avg_log2FC", drop = FALSE]
row_annot <- rowAnnotation(foo = anno_mark(at = c(1:10,196:205), 
                                           labels = row.names(sub_df)[c(1:10,196:205)], 
                                           labels_gp = gpar(fontsize = 8.5)))
col_fun <- circlize::colorRamp2(c(-4, 0, 4), c("blue", "white", "red"))

# save
set.seed(1024)
heat_clt3 <- Heatmap(matrix = as.matrix(sub_df), name = "Average\nlog2FC", 
                     cluster_columns = FALSE, cluster_rows = FALSE, 
                     show_column_names = FALSE, col = col_fun, 
                     show_row_names = FALSE,
                     right_annotation = row_annot)
set.seed(1024)
pdf(file = paste(dge_folder, paste0(cluster, "_dge_heatmap.pdf"), sep = "/"), 
    width = 3, height = 0.1 * nrow(sub_df))
print(heat_clt3)
dev.off()
## png 
##   2
# print
set.seed(1024)
print(heat_clt3)




  • DGE heatmap: Cre3 vs Lox2 - cluster 4
## DGE - heatmaps - clt 4

# data
cluster <- "clt_4"
sub_df <- dge[[cluster]] %>% 
  filter(p_val_adj<0.05 & abs(avg_log2FC)>0) %>% 
  arrange(desc(avg_log2FC))
row.names(sub_df) <- sub_df$Geneid
sub_df <- sub_df[,"avg_log2FC", drop = FALSE]
row_annot <- rowAnnotation(foo = anno_mark(at = c(1:10,84:93), 
                                           labels = row.names(sub_df)[c(1:10,84:93)], 
                                           labels_gp = gpar(fontsize = 8.5)))
col_fun <- circlize::colorRamp2(c(-4, 0, 4), c("blue", "white", "red"))

# save
set.seed(1024)
heat_clt4 <- Heatmap(matrix = as.matrix(sub_df), name = "Average\nlog2FC", 
                     cluster_columns = FALSE, cluster_rows = FALSE, 
                     show_column_names = FALSE, col = col_fun, 
                     show_row_names = FALSE,
                     right_annotation = row_annot)
set.seed(1024)
pdf(file = paste(dge_folder, paste0(cluster, "_dge_heatmap.pdf"), sep = "/"), 
    width = 3, height = 0.1 * nrow(sub_df))
print(heat_clt4)
dev.off()
## png 
##   2
# print
set.seed(1024)
print(heat_clt4)




  • DGE heatmap: Cre3 vs Lox2 - cluster 5
## DGE - heatmaps - clt 5

# data
cluster <- "clt_5"
sub_df <- dge[[cluster]] %>% 
  filter(p_val_adj<0.05 & abs(avg_log2FC)>0) %>% 
  arrange(desc(avg_log2FC))
row.names(sub_df) <- sub_df$Geneid
sub_df <- sub_df[,"avg_log2FC", drop = FALSE]
# row_annot <- rowAnnotation(foo = anno_mark(at = c(1:6,9:18), 
#                                            labels = row.names(sub_df)[c(1:6,9:18)], 
#                                            labels_gp = gpar(fontsize = 8.5)))
col_fun <- circlize::colorRamp2(c(-4, 0, 4), c("blue", "white", "red"))

# save
set.seed(1024)
heat_clt5 <- Heatmap(matrix = as.matrix(sub_df), name = "Average\nlog2FC", 
                     cluster_columns = FALSE, cluster_rows = FALSE, 
                     show_column_names = FALSE, col = col_fun, 
                     show_row_names = TRUE, row_names_gp = gpar(fontsize = 6.5))#,
                     #right_annotation = row_annot)
set.seed(1024)
pdf(file = paste(dge_folder, paste0(cluster, "_dge_heatmap.pdf"), sep = "/"), 
    width = 3, height = 0.1 * nrow(sub_df))
print(heat_clt5)
dev.off()
## png 
##   2
# print
set.seed(1024)
print(heat_clt5)




  • DGE heatmap: Cre3 vs Lox2 - cluster 6
## DGE - heatmaps - clt 6

# data
cluster <- "clt_6"
sub_df <- dge[[cluster]] %>% 
  filter(p_val_adj<0.05 & abs(avg_log2FC)>0) %>% 
  arrange(desc(avg_log2FC))
row.names(sub_df) <- sub_df$Geneid
sub_df <- sub_df[,"avg_log2FC", drop = FALSE]
# row_annot <- rowAnnotation(foo = anno_mark(at = c(1:6,9:18), 
#                                            labels = row.names(sub_df)[c(1:6,9:18)], 
#                                            labels_gp = gpar(fontsize = 8.5)))
col_fun <- circlize::colorRamp2(c(-4, 0, 4), c("blue", "white", "red"))

# save
set.seed(1024)
heat_clt6 <- Heatmap(matrix = as.matrix(sub_df), name = "Average\nlog2FC", 
                     cluster_columns = FALSE, cluster_rows = FALSE, 
                     show_column_names = FALSE, col = col_fun, 
                     show_row_names = TRUE, row_names_gp = gpar(fontsize = 6.5))#,
                     #right_annotation = row_annot)
set.seed(1024)
pdf(file = paste(dge_folder, paste0(cluster, "_dge_heatmap.pdf"), sep = "/"), 
    width = 3, height = 0.1 * nrow(sub_df))
print(heat_clt6)
dev.off()
## png 
##   2
# print
set.seed(1024)
print(heat_clt6)




  • DGE heatmap: Cre3 vs Lox2 - cluster 7
## DGE - heatmaps - clt 7

# data
cluster <- "clt_7"
sub_df <- dge[[cluster]] %>% 
  filter(p_val_adj<0.05 & abs(avg_log2FC)>0) %>% 
  arrange(desc(avg_log2FC))
row.names(sub_df) <- sub_df$Geneid
sub_df <- sub_df[,"avg_log2FC", drop = FALSE]
row_annot <- rowAnnotation(foo = anno_mark(at = c(1:10,48:57),
                                           labels = row.names(sub_df)[c(1:10,48:57)],
                                           labels_gp = gpar(fontsize = 8.5)))
col_fun <- circlize::colorRamp2(c(-4, 0, 4), c("blue", "white", "red"))

# save
set.seed(1024)
heat_clt7 <- Heatmap(matrix = as.matrix(sub_df), name = "Average\nlog2FC", 
                     cluster_columns = FALSE, cluster_rows = FALSE, 
                     show_column_names = FALSE, col = col_fun, 
                     show_row_names = FALSE, 
                     right_annotation = row_annot)
set.seed(1024)
pdf(file = paste(dge_folder, paste0(cluster, "_dge_heatmap.pdf"), sep = "/"), 
    width = 3, height = 0.1 * nrow(sub_df))
print(heat_clt7)
dev.off()
## png 
##   2
# print
set.seed(1024)
print(heat_clt7)







Plot markers over overlay UMAPs


To plot markers it was used the integrated Seurat object with the Cre3 and Lox2 samples. The assay used was RNA and the slot data with the counts normalized with the LogNormalize method (read more).

The individual plots of the new markers highlighted through the integrated UMAPs can be found at: results/func_enrich_plots_09_06_21/plots/markers_all.

## Plot new markers

# folder to save plots
plot_mkr_folder <- "../results/func_enrich_plots_09_06_21/plots/markers_all"
if ( ! dir.exists(plot_mkr_folder) ) dir.create(plot_mkr_folder)

# check assay 
stopifnot(DefaultAssay(seu) == "RNA")
marker_plots <- list()
for ( mkr in markers_all$Genes ) { # loop over marker genes within a cluster
  stopifnot( mkr %in% row.names(seu) ) # stop if gene does not exist in the Seurat obj
  marker_plots[[mkr]] <- FeaturePlot(object = seu, features = mkr, 
                                     slot = "data", reduction = "umap", 
                                     pt.size = 0.001)
  ggsave(filename = paste(plot_mkr_folder, paste0(mkr, "_gene_UMAP.pdf"), sep = "/"), 
         plot = marker_plots[[mkr]], height = 5, width = 5)
}
cowplot::plot_grid(plotlist = marker_plots[1:9], ncol = 3)

cowplot::plot_grid(plotlist = marker_plots[10:18], ncol = 3)

cowplot::plot_grid(plotlist = marker_plots[19:27], ncol = 3)



Plot markers for cluster 8 by Lox2 UMAP


To plot cluster 8 markers it was used the integrated Seurat object with the Cre3 sample only. The assay used was RNA and the slot data with the counts normalized with the LogNormalize method (read more).

The individual plots of the new markers highlighted through the integrated UMAPs can be found at: results/func_enrich_plots_09_06_21/plots/markers_clt8.

## Plot markers for cluster 8 by Lox2 UMAP

# folder to save plots
plot_mkr_clt8_folder <- "../results/func_enrich_plots_09_06_21/plots/markers_clt8"
if ( ! dir.exists(plot_mkr_clt8_folder) ) dir.create(plot_mkr_clt8_folder)

# get Seurat Lox2 obj
cre3 <- SplitObject(object = seu, split.by = "orig.ident")
cre3 <- cre3$Cre3

# save plots
marker_plot_clt8 <- list()
for ( mkr in markers_clt8 ) {
  stopifnot( mkr %in% row.names(cre3) ) # stop if gene does not exist in the Seurat obj
  marker_plot_clt8[[mkr]] <- FeaturePlot(object = cre3, features = mkr, 
                                         slot = "data", reduction = "umap", 
                                         pt.size = 0.001)
  ggsave(filename = paste(plot_mkr_clt8_folder, paste0(mkr, "_gene_UMAP.pdf"), sep = "/"), 
         plot = marker_plot_clt8[[mkr]], height = 5, width = 5)
  
}

# plot all
cowplot::plot_grid(plotlist = marker_plot_clt8, ncol = 3)






Orthology conversion: P. chabaudi vs P. falciparum 3D7


Orthology search was done with the gprofiler2 R package (v.0.2.0) (Kolberg and Raudvere 2020), an interface to the g:Profiler web browser tool. The function gorth() was applied in order get orthologous genes of Plasmodium chabaudi chabaudi (v.PCHAS01; source_organism = "pchabaudi") against Plasmodium falciparum 3D7 (v.ASM276v2; target_organism = "pfalciparum"). The query gene list of P. chabaudi genes consisted in all the genes that are expressed in the Seurat integrated object (n=4697). Genes without correspondence were filtered out (filter_na = TRUE). One *P. chabaudi may have more than one orthologous gene (mthreshold = Inf). This conversion was done at 09/06/2021 using the archived version of the gprofiler2 server - Ensembl 102, Ensembl Genomes 49 (database built on 2020-12-15): https://biit.cs.ut.ee/gprofiler_archive3/e102_eg49_p15/gost.

## Convert orthologous genes between P. chabaudi *vs* P. falciparum 3D7

genes_exp <- row.names(seu)
genes_exp <- gsub(pattern = "PCHAS-", replacement = "PCHAS_", x = genes_exp)
set.seed(1024)
set_base_url("https://biit.cs.ut.ee/gprofiler_archive3/e102_eg49_p15")
orth_pchabaudi_genes <- gorth(query = genes_exp, source_organism = "pchabaudi", target_organism = "pfalciparum")
write.table(file = "../results/func_enrich_plots_09_06_21/tables/orthologous_genes_Pchabaudi_vs_Pfalciparum3D7.tsv", 
            x = orth_pchabaudi_genes, sep = "\t", row.names = FALSE, quote = FALSE)

Among the P. chabaudi 4697 genes queried, it was only found 4410 hits presented below.








Deliver

Folders inside the project folder:

  • results: folder that contains all the results obtained in this analysis;

  • report: folder that contains the report and code used herein;

  • data: folder that contains the data used herein;

  • scripts: folder that contains the scripts used herein;

  • info: folder that contains some useful information (i.e., papers, etc) used herein;






R packages used and respective versions

## R packages and versions used in these analyses

sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] ComplexHeatmap_2.2.0 gprofiler2_0.2.0     readxl_1.3.1        
##  [4] tidyr_1.1.2          ggplot2_3.3.2        Matrix_1.3-2        
##  [7] gplots_3.1.1         SeuratObject_4.0.0   Seurat_4.0.0        
## [10] DT_0.16              dplyr_0.8.5         
## 
## loaded via a namespace (and not attached):
##   [1] Rtsne_0.15           colorspace_1.4-1     rjson_0.2.20        
##   [4] deldir_0.2-9         ellipsis_0.3.1       ggridges_0.5.3      
##   [7] circlize_0.4.12      base64enc_0.1-3      fs_1.5.0            
##  [10] GlobalOptions_0.1.2  clue_0.3-58          spatstat.data_1.7-0 
##  [13] farver_2.0.3         leiden_0.3.7         listenv_0.8.0       
##  [16] ggrepel_0.9.1        lubridate_1.7.9      codetools_0.2-18    
##  [19] splines_3.6.3        knitr_1.30           polyclip_1.10-0     
##  [22] jsonlite_1.7.2       bsplus_0.1.2         ica_1.0-2           
##  [25] cluster_2.1.2        png_0.1-7            uwot_0.1.10         
##  [28] shiny_1.6.0          sctransform_0.3.2    compiler_3.6.3      
##  [31] httr_1.4.2           assertthat_0.2.1     fastmap_1.1.0       
##  [34] lazyeval_0.2.2       later_1.1.0.1        htmltools_0.5.1.1   
##  [37] tools_3.6.3          igraph_1.2.6         gtable_0.3.0        
##  [40] glue_1.4.2           RANN_2.6.1           reshape2_1.4.4      
##  [43] Rcpp_1.0.6           spatstat_1.64-1      scattermore_0.7     
##  [46] cellranger_1.1.0     vctrs_0.3.6          nlme_3.1-152        
##  [49] crosstalk_1.1.0.1    lmtest_0.9-38        xfun_0.19           
##  [52] stringr_1.4.0        globals_0.14.0       mime_0.9            
##  [55] miniUI_0.1.1.1       lifecycle_0.2.0      irlba_2.3.3         
##  [58] gtools_3.8.2         goftest_1.2-2        future_1.21.0       
##  [61] klippy_0.0.0.9500    MASS_7.3-54          zoo_1.8-8           
##  [64] scales_1.1.1         promises_1.1.1       spatstat.utils_2.0-0
##  [67] parallel_3.6.3       RColorBrewer_1.1-2   yaml_2.2.1          
##  [70] reticulate_1.18      pbapply_1.4-3        gridExtra_2.3       
##  [73] downloadthis_0.2.1   rpart_4.1-15         stringi_1.5.3       
##  [76] caTools_1.18.0       shape_1.4.5          rlang_0.4.10        
##  [79] pkgconfig_2.0.3      matrixStats_0.57.0   bitops_1.0-6        
##  [82] evaluate_0.14        lattice_0.20-44      ROCR_1.0-11         
##  [85] purrr_0.3.4          tensor_1.5           labeling_0.4.2      
##  [88] patchwork_1.1.1      htmlwidgets_1.5.3    cowplot_1.1.1       
##  [91] tidyselect_1.1.0     parallelly_1.22.0    RcppAnnoy_0.0.18    
##  [94] plyr_1.8.6           magrittr_2.0.1       R6_2.5.0            
##  [97] generics_0.1.0       pillar_1.4.7         withr_2.4.1         
## [100] mgcv_1.8-35          fitdistrplus_1.1-3   RCurl_1.98-1.2      
## [103] survival_3.2-11      abind_1.4-5          tibble_3.0.6        
## [106] future.apply_1.7.0   crayon_1.3.4         KernSmooth_2.23-20  
## [109] plotly_4.9.3         rmarkdown_2.5        GetoptLong_1.0.5    
## [112] data.table_1.13.2    digest_0.6.27        xtable_1.8-4        
## [115] httpuv_1.5.5         munsell_0.5.0        viridisLite_0.3.0








References

Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, Winston Chang, and Richard Iannone. 2020. Rmarkdown: Dynamic Documents for R. https://github.com/rstudio/rmarkdown.

Butler, Andrew, Paul Hoffman, Peter Smibert, Efthymia Papalexi, and Rahul Satija. 2018. “Integrating Single-Cell Transcriptomic Data Across Different Conditions, Technologies, and Species.” Nature Biotechnology 36: 411–20. https://doi.org/10.1038/nbt.4096.

Gu, Zuguang, Roland Eils, and Matthias Schlesner. 2016. “Complex Heatmaps Reveal Patterns and Correlations in Multidimensional Genomic Data.” Bioinformatics.

Hao, Yuhan, Stephanie Hao, Erica Andersen-Nissen, William M. Mauck III, Shiwei Zheng, Andrew Butler, Maddie J. Lee, et al. 2020. “Integrated Analysis of Multimodal Single-Cell Data.” bioRxiv. https://doi.org/10.1101/2020.10.12.335331.

Kolberg, Liis, and Uku Raudvere. 2020. Gprofiler2: Interface to the ’G:Profiler’ Toolset. https://CRAN.R-project.org/package=gprofiler2.

R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Satija, Rahul, Jeffrey A Farrell, David Gennert, Alexander F Schier, and Aviv Regev. 2015. “Spatial Reconstruction of Single-Cell Gene Expression Data.” Nature Biotechnology 33: 495–502. https://doi.org/10.1038/nbt.3192.

Stuart, Tim, Andrew Butler, Paul Hoffman, Christoph Hafemeister, Efthymia Papalexi, William M Mauck III, Yuhan Hao, Marlon Stoeckius, Peter Smibert, and Rahul Satija. 2019. “Comprehensive Integration of Single-Cell Data.” Cell 177: 1888–1902. https://doi.org/10.1016/j.cell.2019.05.031.

Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown.

Xie, Yihui, Christophe Dervieux, and Emily Riederer. 2020. R Markdown Cookbook. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown-cookbook.